An abnormal metabolism-related gene, ALG3, is a potential diagnostic and prognostic biomarker for lung adenocarcinoma

Background: This study aimed to identify abnormal metabolism-related genes that affect the prognosis of patients with lung adenocarcinoma (LUAD) and to analyze their relationship with immune infiltration and the competing endogenous RNA (ceRNA) network. Methods: Transcriptome data of LUAD were downloaded from The Cancer Genome Atlas database. Abnormal metabolism-related differentially expressed genes in LUAD were screened in R. Cox analysis was used to construct a LUAD prognostic risk model. Kaplan–Meier analysis, ROC curves, and nomograms were used to evaluate the predictive ability of the metabolism-related gene prognostic model. The CIBERSORT algorithm was used to analyze the relationship between risk score and immune infiltration. The starBase database was used to construct a regulatory network consistent with the ceRNA hypothesis. IHC experiments were performed to verify the differential expression of ALG3 in LUAD and paracancerous samples. Results: In this study, 42 abnormal metabolism-related differentially expressed genes were screened. After survival analysis, 5 metabolism-related genes (ALG3, COL7A1, KL, MST1, and SLC52A1) were used to construct the prognostic model. In the model, the survival rate of LUAD patients in the high-risk subgroup was lower than that in the low-risk subgroup. In addition, the risk score of the LUAD prognostic model can be used as an independent prognostic factor for patients. According to the CIBERSORT analysis, the risk score is related to the infiltration of multiple immune cell types. The potential ceRNA network of the model genes in LUAD was constructed through the starBase database. IHC experiments revealed that ALG3 expression was upregulated in LUAD. Conclusion: The prognostic model of LUAD reveals the relationship between metabolism and prognosis of LUAD, and provides a novel perspective for the diagnosis and study of LUAD.


AR and XC contributed equally to this paper.

Natural Science Foundation of Xinjiang Uygur Autonomous Region, Grant No.: 2022D01F10; Analysis of the causes of tuberculosis high incidence in Kashgar region and key technology development for artificial-intelligence-based discrimination of imaging big data, Grant No.: 2022B03032-1; KaShi Sci-Tech Plan Project, Grant No.: KS2023012.

Introduction
Lung cancer is a malignant tumor originating from the bronchial mucosa or glands. Lung adenocarcinoma (LUAD) is one of its major histological subtypes, with high incidence and mortality.[1] However, its etiology has not yet been fully clarified; most cases are believed to be closely related to atmospheric pollution, hereditary factors, and smoking.[2] The heterogeneity of LUAD poses a challenge for predicting the prognosis of patients. Clinical manifestations, morphology, molecular features, therapeutic effects, and prognosis vary widely from case to case. Clinically, tumor stage, histological grading, and molecular subtype are commonly used as prognostic factors for LUAD. However, the ability of clinical features to predict LUAD prognosis is limited. An inaccurate grasp of the prognostic risk of LUAD patients may lead to overtreatment of low-risk patients and inappropriate treatment of high-risk patients.[3] Therefore, it is of critical importance to construct a reliable and clinician-friendly prognostic model of lung adenocarcinoma for targeted interventions.
Metabolic changes are one of the hallmarks of tumor cells, and it is generally believed that they are achieved through increased aerobic fermentation.[4] Metabolic reprogramming refers to the cancer-related metabolic changes that occur during tumorigenesis, and therapeutic approaches targeting this mechanism are currently being explored.[5] Within tumor cells, many metabolic changes arise to meet the energy and biosynthetic requirements of tumors, which has become an important feature of tumors.[6] In this study, a model combining multiple cancer metabolism-related genes was constructed to predict the prognostic risk of LUAD patients. Through differential analysis of metabolism-related genes in the LUAD data from The Cancer Genome Atlas (TCGA) database, metabolism-related differentially expressed genes were screened, and a prognostic model was constructed through Cox regression to find therapeutic targets, predict the prognosis of LUAD patients, and support individualized diagnostic and therapeutic plans.

Material
The LUAD transcriptome data used in this study were obtained from the TCGA database and included 515 LUAD and 59 normal samples. The GSE50081 data set was downloaded from the Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/). Its platform is GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array), and it includes 127 lung adenocarcinoma samples with prognostic information. Abnormal metabolism-related genes (153 in total) were downloaded from the MSigDB database on the Gene Set Enrichment Analysis web page (Supplementary data S1, Supplemental Digital Content, http://links.lww.com/MD/N316).

Acquisition of DEGs related to metabolism
After normalizing the transcriptome data with the "edgeR" package,[7] the differentially expressed genes (DEGs) in LUAD were screened. A false discovery rate (FDR) < 0.05 and |log2 fold change| > 1 were considered to indicate significant differential expression. A Venn diagram was used to extract the abnormal metabolism-related genes among the DEGs. The STRING database was used to explore the interactions between the differentially expressed abnormal metabolism-related genes.
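The screening rule above can be illustrated with a minimal Python sketch. It is not the authors' R/edgeR code: the function names, gene labels, and toy statistics below are hypothetical, and the sketch assumes that log2 fold changes and FDR values have already been computed elsewhere.

```python
from typing import Dict, Iterable, Set, Tuple


def select_degs(stats: Dict[str, Tuple[float, float]],
                fdr_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> Set[str]:
    """Keep genes with FDR < fdr_cutoff and |log2 fold change| > lfc_cutoff.

    stats maps a gene symbol to (log2 fold change, FDR).
    """
    return {gene for gene, (log2fc, fdr) in stats.items()
            if fdr < fdr_cutoff and abs(log2fc) > lfc_cutoff}


def metabolism_degs(degs: Iterable[str],
                    metabolism_genes: Iterable[str]) -> Set[str]:
    """Overlap region of the Venn diagram: DEGs that are also in the gene set."""
    return set(degs) & set(metabolism_genes)


# Toy values (hypothetical, not the study's data).
toy_stats = {
    "GENE_A": (2.3, 0.001),   # passes both thresholds
    "GENE_B": (-1.8, 0.020),  # passes both thresholds (downregulated)
    "GENE_C": (0.4, 0.001),   # fold change too small
    "GENE_D": (3.1, 0.200),   # FDR too high
}
toy_metabolism_set = {"GENE_B", "GENE_C", "GENE_E"}

toy_degs = select_degs(toy_stats)
toy_overlap = metabolism_degs(toy_degs, toy_metabolism_set)
```

In this toy input only GENE_B is both differentially expressed and a member of the metabolism gene set, which mirrors how the 42 genes in this study correspond to the overlap of the two lists.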

Functional enrichment analysis
The "clusterProfiler" package [8] was used for GO functional enrichment and KEGG pathway analysis. The "GOplot" package was used to calculate the z-score. The significantly enriched GO and KEGG results are displayed as chord plots.
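For orientation, the z-score reported by GOplot is not a standard statistical z-score; per term it is defined as (up − down) / √count, where up and down are the numbers of upregulated and downregulated genes assigned to the term and count is their total. A minimal Python sketch of that definition follows; the function name is hypothetical, and the sketch assumes that a gene's direction is read from the sign of its log2 fold change.

```python
import math
from typing import Iterable


def go_term_zscore(log2fc_values: Iterable[float]) -> float:
    """GOplot-style z-score of one term: (up - down) / sqrt(count)."""
    values = list(log2fc_values)
    if not values:
        raise ValueError("a term needs at least one assigned gene")
    up = sum(1 for v in values if v > 0)      # upregulated genes in the term
    down = sum(1 for v in values if v < 0)    # downregulated genes in the term
    return (up - down) / math.sqrt(len(values))


# Toy term with three upregulated genes and one downregulated gene.
toy_z = go_term_zscore([1.2, 2.0, -1.5, 3.0])
```

A positive value indicates that the term is dominated by upregulated genes, and a negative value indicates that it is dominated by downregulated genes.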

Construction of prognostic risk model
The "survival" package was used to screen abnormal metabolism-related genes associated with prognosis in LUAD. The prognosis-related metabolic genes were then included in multivariate analysis to obtain the LUAD prognostic risk model. The risk score of each patient was calculated as the sum, over the model genes, of each gene's expression multiplied by its coefficient in the LUAD model. LUAD patients were divided into a high-risk group and a low-risk group according to the median risk score. The Kaplan–Meier method was used to compare survival between the high-risk and low-risk groups, and the ROC curve and area under the curve (AUC) were used to evaluate the accuracy of the prognostic risk model. A nomogram was drawn using the risk score and clinical information.

Immune cell infiltration
The CIBERSORT method was used to calculate the proportions of 22 types of immune cells. Pearson correlation analysis was used to assess the correlation between risk score and immune infiltration, and box plots and scatter plots were used to show the differences between risk subgroups. Pearson correlation analysis was also used to assess the expression correlation between model genes and immune checkpoint-related genes in LUAD.
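The correlation step can be sketched as follows. This is an illustrative Python implementation of the Pearson coefficient only; it does not run CIBERSORT, and the function name and toy vectors are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


# Toy data (hypothetical): risk scores and the estimated fraction of one
# immune cell type in the same four samples.
toy_risk = [0.5, 1.0, 1.5, 2.0]
toy_fraction = [0.20, 0.15, 0.10, 0.05]
toy_r = pearson_r(toy_risk, toy_fraction)
```

Here the cell fraction falls linearly as the risk score rises, so the coefficient is close to −1; in the analysis described above the same calculation would be repeated for each of the 22 cell types.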

Validation of model gene expression
The HPA database (https://www.proteinatlas.org/) was used to analyze the expression of model genes at the protein level. The CCLE database was used to analyze the expression of model genes in lung adenocarcinoma cell lines. The genetic alterations of model genes in LUAD were then explored through the cBioPortal database (https://www.cbioportal.org/).

Prediction of ceRNA network construction
The target miRNAs of the model gene mRNAs were predicted with the starBase database and analyzed with 7 prediction programs (PITA, miRanda, DIANA microT, PicTar, miRmap, RNA22, and TargetScan). Target miRNA screening conditions were programNum ≥ 1, CLIP-Data ≥ 1, pan-Cancer ≥ 1, and Degradome-Data ≥ 1. In addition, we analyzed the correlation between target miRNAs and mRNAs and screened the miRNAs that better fit the ceRNA hypothesis. Based on the miRNAs screened in this way, upstream lncRNAs of the miRNAs were predicted. The correlation between miRNAs and lncRNAs was further analyzed to screen the lncRNAs that better fit the ceRNA hypothesis. By jointly analyzing the "miRNA-mRNA" and "miRNA-lncRNA" pairs, the "lncRNA-miRNA-mRNA" network of LUAD was established.
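The assembly logic of such a network can be sketched in Python as below. Several points are assumptions rather than details given in the text: the record fields are hypothetical names for the four starBase evidence counts, the same thresholds are applied to the lncRNA predictions, and "fitting the ceRNA hypothesis" is taken to mean a negative expression correlation between the miRNA and its target, with no further cutoff.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Prediction(NamedTuple):
    mirna: str
    target: str           # mRNA or lncRNA symbol
    program_num: int      # number of supporting prediction programs
    clip_data: int        # supporting CLIP experiments
    pan_cancer: int       # supporting cancer types
    degradome_data: int   # supporting degradome experiments
    r: float              # miRNA-target expression correlation


def passes_filters(p: Prediction) -> bool:
    """Evidence counts all >= 1 and (assumed) negative correlation."""
    supported = min(p.program_num, p.clip_data,
                    p.pan_cancer, p.degradome_data) >= 1
    return supported and p.r < 0


def build_triplets(mrna_preds: Iterable[Prediction],
                   lncrna_preds: Iterable[Prediction]
                   ) -> List[Tuple[str, str, str]]:
    """Join retained miRNA-mRNA and miRNA-lncRNA pairs on the shared miRNA."""
    mirna_to_mrnas: Dict[str, List[str]] = {}
    for p in filter(passes_filters, mrna_preds):
        mirna_to_mrnas.setdefault(p.mirna, []).append(p.target)
    triplets = []
    for p in filter(passes_filters, lncrna_preds):
        for mrna in mirna_to_mrnas.get(p.mirna, []):
            triplets.append((p.target, p.mirna, mrna))
    return sorted(triplets)


# Toy predictions (hypothetical identifiers and values).
toy_mrna_preds = [
    Prediction("miR-X", "MRNA_1", 2, 3, 1, 1, -0.30),  # kept
    Prediction("miR-Y", "MRNA_1", 1, 1, 0, 1, -0.40),  # no pan-cancer support
    Prediction("miR-Z", "MRNA_2", 1, 1, 1, 1, 0.20),   # positive correlation
]
toy_lncrna_preds = [
    Prediction("miR-X", "LNC_1", 1, 2, 1, 1, -0.20),   # kept, shares miR-X
    Prediction("miR-Z", "LNC_2", 1, 1, 1, 1, -0.50),   # miR-Z has no mRNA left
]
toy_network = build_triplets(toy_mrna_preds, toy_lncrna_preds)
```

Only the lncRNA and mRNA that share a retained miRNA end up in a triplet, which is the "lncRNA-miRNA-mRNA" structure described above.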

Immunohistochemistry (IHC) staining
Tissue microarrays were constructed by Shanghai Joly Bio-Tech Co. (Joly Bio-Tech Co., Ltd, Shanghai, China). The tissue array contained 1.5 mm diameter formalin-fixed paraffin-embedded tissue disks from a total of 32 LUADs and 28 adjacent tissues. Tissue microarrays were serially cut into 3 to 5 μm thick sections. First, sections were deparaffinized with xylene, dehydrated with anhydrous ethanol, and boiled in citrate buffer for 10 minutes for antigen retrieval, and endogenous peroxidase was blocked. Then the primary antibody (anti-ALG3 antibody, diluted 1:200) was added and incubated at 4 °C overnight. The secondary antibody was added and incubated at 25 °C for 1 hour, followed by rinsing with PBS (phosphate-buffered saline), DAB (3,3ʹ-diaminobenzidine) color development, hematoxylin counterstaining, gradient ethanol and xylene dehydration, and neutral resin sealing. IHC-stained sections were independently analyzed and scored by an experienced pathologist. The percentage of positively stained cells and the staining intensity were evaluated microscopically. Staining score: 0 to 1, negative; 1 to 2, weak; 2 to 3, moderate; 3 to 4, strong.

Screening of abnormal metabolic genes
We first designed and described the research process (Fig. 1). After obtaining the gene expression data of 515 LUAD cases and 59 normal lung tissues, the data were normalized, and differential expression analysis was performed between LUAD and normal tissues. A total of 5481 DEGs (P < .05) were screened out (Fig. 2A), of which 42 were abnormal metabolism-related genes (Fig. 2B). The predicted interactions between these metabolism-related genes are shown in Fig. 2C.

Enrichment analysis
The top ten biological processes (BP) in the GO functional enrichment results included vitamin metabolic process, vitamin transport, alcohol metabolic process, glutamine family amino acid metabolic process, organic hydroxy compound biosynthetic process, cobalamin metabolic process, alpha-amino acid metabolic process, regulation of alcohol biosynthetic process, water-soluble vitamin metabolic process, and alcohol biosynthetic process (Fig. 3A). The chord plot shows the distribution of genes corresponding to BP (Fig. 3B). The top ten cellular components (CC) include apical plasma membrane, apical part of cell, brush border, brush border membrane, cluster of actin-based cell projections, cell projection membrane, and smooth endoplasmic reticulum (Fig. 3C). The chord plot shows the distribution of genes corresponding to CC (Fig. 3D). The top ten molecular functions (MF) include vitamin binding, tetrapyrrole binding, oxidoreductase activity, lipid transfer activity, manganese ion transmembrane transporter activity, cobalamin binding, sodium:phosphate symporter activity, lyase activity, hormone binding, and zinc ion transmembrane transporter activity (Fig. 3E). The chord plot shows the distribution of genes corresponding to MF (Fig. 3F). The enriched KEGG pathways include parathyroid hormone synthesis, secretion and action; vitamin digestion and absorption; steroid hormone biosynthesis; primary bile acid biosynthesis; and biosynthesis of amino acids (Fig. 3G). The chord plot shows the distribution of genes corresponding to the KEGG pathways (Fig. 3H).

Construction of prognostic risk model
Univariate Cox analysis of the abnormal metabolism-related genes was carried out with the "survival" package, and 9 prognosis-related genes were screened out (Fig. 4A). These genes were then entered into multivariate Cox regression analysis to establish a multigene prognostic risk model (Fig. 4B). The risk score of each patient was calculated as the sum of the expression of each selected prognostic gene multiplied by its multivariate Cox regression coefficient. LUAD patients were sorted by risk score and divided into a high-risk subgroup and a low-risk subgroup, with the median value as the boundary (Fig. 4C). The distribution of survival status shows that deaths are more concentrated in the high-risk group (Fig. 4D). The heat map shows the expression trend of the 5 model genes (Fig. 4E). The survival rate of LUAD patients in the high-risk group of the model was lower (Fig. 4F). In the ROC curve assessment, the AUC values of the risk model for predicting 1- and 3-year prognosis in LUAD patients were 0.706 and 0.682, respectively (Fig. 4G).

Figure 2. (B) Extraction of differentially expressed genes related to abnormal metabolism. Among them, 42 metabolism-related genes were abnormally expressed. (C) Protein interaction network of the differentially expressed metabolism-related genes.

Validation of prognostic risk model in the GEO dataset
The risk score of LUAD patients in the GSE50081 dataset was calculated according to the risk score formula. Patients were divided into a high-risk subgroup and a low-risk subgroup according to the median value (Fig. 5A). The distribution of survival status was similar to that of the TCGA data, and deaths were more concentrated in the high-risk subgroup (Fig. 5B). The heat map shows the expression trend of the 5 model genes (Fig. 5C). Patients in the high-risk subgroup of the model had a poor prognosis (Fig. 5D). In the ROC curve assessment, the AUC values of the risk model for predicting 1- and 3-year prognosis in LUAD patients were 0.730 and 0.755, respectively (Fig. 5E).
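The validation workflow (apply the fixed risk score formula, split at the median, compare survival between groups) can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: the coefficients, gene labels, and patients are hypothetical toy values, patients whose score equals the median are assigned to the low-risk group, and the Kaplan–Meier estimator is written out by hand rather than taken from a survival library.

```python
from statistics import median
from typing import Dict, List, Mapping, Sequence, Tuple


def risk_score(expression: Mapping[str, float],
               coefficients: Mapping[str, float]) -> float:
    """Sum over model genes of (Cox coefficient x expression value)."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: Mapping[str, float]) -> Dict[str, str]:
    """Label each patient 'high' (score above the median) or 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (event time, survival probability) pairs.

    events: 1 = death observed, 0 = censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy cohort (hypothetical): expression, follow-up time in months, event flag.
toy_coefficients = {"GENE_1": 0.5, "GENE_2": -0.25}
toy_cohort = {
    "P1": ({"GENE_1": 2.0, "GENE_2": 4.0}, 30.0, 0),
    "P2": ({"GENE_1": 4.0, "GENE_2": 0.0}, 10.0, 1),
    "P3": ({"GENE_1": 2.0, "GENE_2": 0.0}, 20.0, 1),
    "P4": ({"GENE_1": 0.0, "GENE_2": 4.0}, 40.0, 1),
}
toy_scores = {pid: risk_score(expr, toy_coefficients)
              for pid, (expr, _, _) in toy_cohort.items()}
toy_groups = split_by_median(toy_scores)
toy_curves = {}
for label in ("high", "low"):
    members = [pid for pid, group in toy_groups.items() if group == label]
    toy_curves[label] = kaplan_meier([toy_cohort[p][1] for p in members],
                                     [toy_cohort[p][2] for p in members])
```

The key design point is that the coefficients are fixed from the training cohort and reused unchanged in the validation cohort; only the median cutoff is recomputed from the validation scores, as described above.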

Prognostic value of risk model
To further evaluate the predictive value of the risk model, univariate and multivariate Cox regression analyses were carried out. Univariate analysis showed that the metabolism-related gene risk score could be used as a predictor of LUAD prognosis (Fig. 6A). Multivariate analysis showed that the risk score could independently predict the prognosis of LUAD (HR = 2.109, P < .001) (Fig. 6B). The "rms" package was used to plot a nomogram to predict LUAD patient survival from clinical factors and risk score (Fig. 6C). In the calibration curve, the actual 3-year survival rate agreed well with the predicted survival rate (Fig. 6D). The ROC curve showed that the AUC value for 3-year survival was 0.75 (Fig. 6E).
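As a reading aid for the hazard ratio reported above: in a Cox model the hazard ratio of a covariate is the exponential of its regression coefficient, and a 95% confidence interval is conventionally obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The Python sketch below illustrates this relationship only; the function names are hypothetical, and the standard error of the reported estimate is not given in the text, so none is assumed here.

```python
import math
from typing import Tuple


def hazard_ratio(beta: float) -> float:
    """Hazard ratio for a one-unit increase in a covariate with coefficient beta."""
    return math.exp(beta)


def hazard_ratio_ci(beta: float, std_err: float,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wald confidence interval for the hazard ratio (z = 1.96 for 95%)."""
    return math.exp(beta - z * std_err), math.exp(beta + z * std_err)


# The reported HR of 2.109 corresponds to a coefficient of ln(2.109).
reported_hr = 2.109
implied_beta = math.log(reported_hr)
```

A hazard ratio above 1, as here, means that a higher risk score is associated with a higher hazard of death after adjusting for the other covariates in the multivariate model.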

Correlation between immune cell infiltration and risk model
In the study of the tumor microenvironment, the CIBERSORT method was used to calculate the proportions of immune cell subsets.

Validation of model genes
In this study, the expression of model genes in normal and lung adenocarcinoma tissues was analyzed using the immunohistochemical images in the HPA database. The results showed that the expression of ALG3, COL7A1, and SLC52A1 was upregulated in LUAD tissues (Fig. 9A to C). In addition, ALG3 and COL7A1 were highly expressed in most LUAD cell lines (Fig. 9D). The genetic alterations of the 5 model genes were also analyzed based on LUAD sample data in the cBioPortal database. Among LUAD patients, the genetic alteration rates of ALG3, COL7A1, KL, MST1, and SLC52A1 were 4%, 4%, 2.5%, 1.4%, and 1.4%, respectively (Fig. 10A). The main types of genetic alteration in the model genes included mutation, amplification, and deep deletion. The main alteration type of ALG3 was amplification, and that of COL7A1 was mutation. The main alteration types of KL were mutation and deep deletion, and those of MST1 were mutation and mRNA High. The main alteration type of SLC52A1 was deep deletion (Fig. 10B).

Expression of ALG3 in LUAD
Expression of the key model gene ALG3 was validated in LUAD tissues. Tissue microarrays constructed from 31 LUAD and 28 paraneoplastic samples were used, and immunohistochemistry was performed to validate ALG3 expression in cancer and adjacent tissues. ALG3 was strongly expressed in cancer tissues compared with paraneoplastic tissues (Fig. 13A and B). A paired t-test was used to detect differences in ALG3 expression between cancer and adjacent samples, and ALG3 expression was found to be upregulated in cancer (Fig. 13C). The ROC curve showed that ALG3 had good diagnostic efficiency for cancer (AUC = 0.741) (Fig. 13D).
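For a two-class comparison such as tumor versus adjacent tissue, the AUC equals the probability that a randomly chosen tumor sample has a higher score than a randomly chosen adjacent sample, with ties counted as one half. The Python sketch below computes it in that rank-based form; the function name and the toy staining scores are hypothetical and are not the study's measurements.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float],
            negative_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # tumor sample scored higher
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))


# Toy IHC staining scores (hypothetical values on the 0 to 4 scale).
toy_tumor = [2, 3, 4, 4]
toy_adjacent = [1, 2, 3, 3]
toy_auc = roc_auc(toy_tumor, toy_adjacent)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for interpreting the reported value of 0.741.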

Discussion
Tumor cells undergo many changes to meet their energy and material needs, and metabolic change has become an important feature of tumors. Increasing attention is being paid to metabolism-related genes as a means of judging the characteristics and prognosis of tumors.[9] For example, a prognostic model of lung adenocarcinoma based on immuno-glycolysis-related genes has a good predictive effect on patient survival.[10] Therefore, in this study, abnormal metabolism-related genes were used to judge the prognosis of LUAD. The mRNA expression profile data of LUAD patients and control samples were downloaded from the TCGA database. Genes related to abnormal metabolism were extracted for differential analysis. Finally, 42 differentially expressed genes related to abnormal metabolism were screened, and 5 genes related to prognosis were identified. The 5 genes used to construct the prognostic model were ALG3, COL7A1, KL, MST1, and SLC52A1. Previous studies have shown that these model genes have strong predictive ability for survival in lung adenocarcinoma and other tumors. The ALG3 gene encodes a member of the ALG3 (alpha-1,3-mannosyltransferase) family and is associated with glycosylation diseases and protein metabolism. It has been demonstrated that ALG3 is upregulated in NSCLC tissues and cells, and that patients with high ALG3 expression have a poorer prognosis. Overexpression of ALG3 promotes the malignant phenotype and the EMT process in NSCLC cells.[11] Consistent with previous results, ALG3 expression was upregulated in LUAD tissues and had good diagnostic efficacy for LUAD. Type VII collagen (encoded by the COL7A1 gene) acts as an anchoring fibril for the basement membrane and contributes to epithelial basement membrane organization and adhesion by interacting with extracellular matrix (ECM) proteins such as type IV collagen. COL7A1 was more densely expressed in gastric cancer tissues and malignant esophageal cancer tissues than in normal tissues, and high COL7A1 expression was associated with tumor infiltration, metastasis, and poor patient prognosis.[12] Klotho (KL) is a classical senescence suppressor gene. It has been found that Klotho gene polymorphisms are associated with tumor development and growth. As a valuable tumor suppressor gene, KL is expressed at low levels in various cancerous tissues, including renal, breast, liver, lung, and pancreatic cancers, which is consistent with its ability to inhibit tumor cell growth.[13] Mammalian STE20-like kinase 1 (MST1) is a hepatocyte growth factor-like protein alpha chain and a link in the Hippo pathway. Studies have shown that MST1 exerts tumor suppressor effects by regulating apoptosis, migration, and proliferation of colorectal and lung cancer cells.[14] Solute carrier family 52 member 1 (SLC52A1) plays a critical role in the biochemical redox reactions of carbohydrate, lipid, and amino acid metabolism. Deficiency of SLC52A1 directly enhances immunosuppressive activity by facilitating STAT3-mediated reactive oxygen species production, contributing to unfavorable prognosis in cancer.[15] Among the above genes, ALG3, COL7A1, MST1, and SLC52A1 were upregulated in LUAD and KL was downregulated in LUAD, and all of them were associated with patient prognosis. ALG3 and COL7A1 were potential risk factors for LUAD and may be potential targets for antitumor therapy. Although the above genes are closely related to the prognosis of LUAD, this study is the first to use them in combination to predict the survival of LUAD patients.
In this study, 5 prognosis-related genes (ALG3, COL7A1, KL, MST1, and SLC52A1) were used to establish a multigene prognostic risk model for LUAD. The distribution of survival status indicates that deaths are more concentrated in the high-risk group, and the prognosis of LUAD patients in the high-risk subgroup is poor. This model has good predictive efficiency for LUAD, which was also verified in the GSE50081 data set. Cox regression analysis showed that the risk score was an independent prognostic factor for LUAD.
It is well known that immune infiltration is associated with tumor progression, and its role in the development of malignant tumors has been confirmed.[16,17] In our study, we used the CIBERSORT method to estimate immune cell infiltration in LUAD patients and analyzed its relationship with the risk model. The results show that activated memory CD4 T cells, regulatory T cells (Tregs), M0 macrophages, M1 macrophages, and neutrophils were significantly enriched in the high-risk group, whereas naive B cells, resting memory CD4 T cells, monocytes, resting dendritic cells, and resting mast cells were enriched in the low-risk subgroup. B cell infiltration is a marker of improved prognosis in lung cancer, and the lower the tumor grade, the higher the proportion of naive B cells.[18] Tregs can inhibit the function of CD8 T cells through self-secreted cytokines and promote tumor progression to a certain extent.[19] The degree of activation of memory CD4+ T cells correlated with the malignancy of the tumor.[20] The higher proportion of activated memory CD4 T cells in the high-risk group and of resting memory CD4 T cells in the low-risk group indicated that LUAD tumors were more malignant in the high-risk group, which indirectly supports the risk prediction ability of the model for LUAD. It has been shown that CD68+ HLA-DR+ M1-type macrophages enhance tumor cell motility in hepatocellular carcinoma.[21] Exosomes in oral squamous cell carcinoma regulate the conversion of macrophages to the M1 type, which promotes malignant tumor metastasis.[22] The pro-tumorigenic effects of M1 macrophages may be due to inflammatory cytokines. Increased infiltration of M0-type and M1-type macrophages predicted high malignancy of LUAD in the high-risk group. In the immune microenvironment, neutrophils can produce a variety of factors that promote tumor development.[23] For example, they can produce growth factors and angiogenic factors that help tumors grow and spread. In some studies, an increase in the number of neutrophils in the tissues of patients with tumors has been associated with a poor prognosis.[24] Increased neutrophil infiltration in the high-risk group was consistent with the risk model's prediction of a poor prognosis for patients with LUAD. Similarly, activated memory CD4 T cells, M0 macrophages, M1 macrophages, and neutrophils were positively correlated with the risk score of LUAD patients, whereas resting memory CD4 T cells, naive B cells, resting mast cells, resting dendritic cells, and monocytes were negatively correlated with it. The immune infiltration results suggest a potential mechanism by which the LUAD risk model predicts patient prognosis. Inhibiting the regulatory activity of cancer-promoting immune cells may become a potential therapeutic strategy.
In conclusion, the metabolism-related genes obtained by screening can significantly affect the metabolism of LUAD patients, which in turn affects their prognosis. The established prognostic model can provide clinicians with a useful reference for patient prognosis, and the risk score is associated with the infiltration of multiple immune cell types. IHC experiments revealed that the metabolic abnormality-associated gene ALG3 may be a potential diagnostic and prognostic biomarker for LUAD. However, this study has some limitations. The prognostic model needs to be further validated in prospective cohorts with larger sample sizes. The ceRNA regulatory network proposed in this study provides a theoretical framework for understanding molecular interactions, but experimental validation is needed to confirm the functional relevance of these interactions in the pathogenesis of LUAD.

Figure 1. Research design and workflow of this study.

Figure 3. Functional enrichment analysis of 42 differentially expressed abnormal metabolism-related genes. (A) Top 10 BP terms in GO enrichment analysis. The outer circle of the scatterplot shows the logFC identifier of the gene in each term. Red circles indicate upregulated genes and blue circles indicate downregulated genes. (B) Distribution of genes corresponding to each term in BP. (C) Top 10 CC terms in GO enrichment analysis. (D) Distribution of genes corresponding to each term in CC. (E) Top 10 MF terms in GO enrichment analysis. (F) Distribution of genes corresponding to each term in MF. (G) Top 10 terms in the KEGG pathway. (H) Distribution of genes corresponding to each term in KEGG pathway.

Figure 4. Establishment and evaluation of prognostic risk model. (A) Univariate Cox analysis of abnormal metabolism-related genes. (B) Multivariate Cox analysis of abnormal metabolism-related genes. (C) Scatter plot of LUAD patient risk scores from low to high. Red dots represent the high-risk group. Blue dots represent the low-risk group. (D) Scatter plot distribution of survival time and survival status corresponding to risk scores of high- and low-risk groups. (E) Heat map of the expression trend of the 5 model genes in LUAD patients. Expression increases gradually from blue to red. (F) Comparison of survival rate between high- and low-risk groups. (G) ROC curves to assess the effectiveness of the risk model in predicting 1- and 3-year outcomes.

Figure 5. Validation of prognostic risk model in GEO database. (A) Scatter plot of patient risk scores from low to high. Red dots represent the high-risk group. Blue dots represent the low-risk group. (B) Scatter plot distribution of survival time and survival status. Red represents death cases and blue represents living cases. (C) Heat map of the expression trend of the 5 model genes. Expression increases gradually from blue to red. (D) Kaplan–Meier survival curves for OS in high-risk and low-risk groups. (E) ROC curves to assess the effectiveness of the risk model in predicting 1- and 3-year outcomes.

Figure 6. Prognostic value of the risk model in LUAD patients. (A) Univariate prognostic analysis. (B) Multivariate independent prognostic analysis. (C) Nomogram of the prognostic model. (D) Calibration curve testing the accuracy of the model in predicting 3-year prognosis. (E) ROC curve testing the accuracy of the model in predicting 3-year survival status.

Figure 7. Difference of immune cell infiltration in different risk subgroups. (A) Histogram of the proportion of 22 immune cells in LUAD tissue. Each cell type is represented by a different color. Each column represents an LUAD sample ID. (B) Box plot showing differences in immune cell infiltration between high- and low-risk groups.

Figure 8. Correlation between risk score and immune cell infiltration level. The R value is the correlation coefficient; a positive value indicates a positive correlation.

Figure 9. Expression verification of model genes. (A) Expression of ALG3 protein in cancer and normal control tissues. (B) Expression of MST1 protein in cancer and normal control tissues. (C) Expression of COL7A1 protein in cancer and normal control tissues. (D) Expression levels of model genes (ALG3, COL7A1, MST1, and SLC52A1) in LUAD cell lines.

Figure 10. Model gene mutation analysis. (A) Mutation of model genes in clinical cases. (B) Percentage of mutation types in model genes.